DETERMINATION OF MERCURY IN FISH TISSUE
PART 1: PERFORMANCE CRITERIA 


BASED ON 32 SAMPLES ANALYZED BY 16 LABS 
IN EIGHT STUDIES BETWEEN JULY 1987 AND JUNE 1991 
DISTRIBUTED BY THE FRESHWATER INSTITUTE 
DEPARTMENT OF FISHERIES AND OCEANS 


ABSTRACT 


This report summarizes various aspects of laboratory performance based on data collected by the 
Freshwater Institute (FWI), Department of Fisheries and Oceans, Winnipeg. Laboratory 
performance criteria have been calculated for the analysis of Mercury in canned fish based on 
an evaluation of within and between laboratory estimates of standard deviation. The evaluation 
is based on a set of five laboratories, selected from among 16, who displayed the least bias and 
best precision over a series of 8 sets of 4 canned fish tissue samples distributed between July 
1987 and June 1991. Samples fell in the concentration range of 0.1 to 1.6 ppm Hg. 


The interlaboratory sample average value excluding outliers (FWI Avg), as reported by FWI for 
each sample, was used to assess group and individual laboratory performance. The 
within-laboratory standard deviation was approximately Sw = 0.008 + 3.6% FWI Avg ± 0.004 ppm Hg. 
The ‘best 5’ of 16 laboratories were very well controlled, with an interlaboratory long-term 
precision of approximately S = 0.004 + 4.1% FWI Avg ± 0.007 ppm Hg, very similar to the 
in-laboratory estimate. Their individual average bias was less than ± 0.02 ppm with an average 
intersample standard deviation of 0.02 to 0.04 ppm Hg. An added five laboratories displayed 
occasional bias of 0.02 to 0.045 ppm and poorer interstudy precision. For these ‘better 10’ 
facilities the interlaboratory precision was S = 0.011 + 5.9% FWI Avg ± 0.017 ppm Hg. The 
remaining six laboratories displayed chronic control problems but demonstrated an underlying 
comparable ability. Individual laboratory control problems appear to stem from determinate, 
systematic, errors such as bias in stock standards, inadequate slope control, or error in 
baseline/blank settings from study to study. Based on this review, achievable performance 
criteria can be set for laboratories performing Mercury analysis on the canned fish tissue samples 
provided by FWI. Although the precision of preparation and homogeneity of the fish samples 
would be expected to contribute to overall error, its actual impact appears to be small compared 
to laboratory precision. 


INTRODUCTION 


This review was initiated to determine whether performance criteria could be set for contract 
laboratory analysis of mercury in fish. The interpretation of long-term environmental trends and 
effects requires careful and continuing control of analytical bias. The most frequent sources of 
determinate error include accuracy of standards, inadequate calibration control, laboratory 
contamination, and inappropriate corrections for method blank or baseline effects. Ability to 
maintain long-term precision that is comparable to the single-analyst within-day repeatability of 
a test is the mark of a controlled laboratory. This ability is best evaluated by the analysis of a 
series of ‘unknown’ actual matrix samples of demonstrated homogeneity and stability. 


In the early 1970’s, the Inspection Services staff at the Freshwater Institute (FWI), Department 
of Fisheries and Oceans, Winnipeg initiated a series of ‘round-robin’ studies using samples of 
canned fish prepared for this purpose. These studies have continued on a more or less regular 
4-month basis for the past two decades. For the past few years there have been 20 to 30 
participants in each study. Each analyst is requested to provide triplicate measurements on two 
different days, for each of the 4 different samples. Samples are not repeated from one study to 
the next. Laboratories participate on a voluntary basis, and are not identified by name in the 
study reports. 


The complete FWI data base covers twenty years, many samples, and a variety of laboratories. 
Even for the four years evaluated in this report, it provides a remarkable insight into long-term 
analytical control and the ability of individual laboratories to demonstrate control. FWI staff 
prepare a summary report which lists the average and standard deviation of the replicate results 
reported by each participant for each sample. They calculate an overall average and standard 
deviation for all participants. They also report a sample average (excluding outliers) and 
corresponding standard deviation for the data from those participants not identified as a statistical 
outlier. The FWI outlier procedure iteratively excludes data which differs by more than 2 
standard deviations from the current overall average. 
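The iterative exclusion described above can be sketched as follows. This is only an illustration of the stated rule (drop values more than 2 standard deviations from the current average, then recompute); the function name, the stopping rule, and the sample values are assumptions and are not taken from the FWI reports.

```python
from statistics import mean, stdev


def average_excluding_outliers(values, k=2.0):
    """Return (average, standard deviation, retained values) after iteratively
    dropping values more than k standard deviations from the current average."""
    kept = list(values)
    while len(kept) > 2:
        centre, spread = mean(kept), stdev(kept)
        inside = [v for v in kept if abs(v - centre) <= k * spread]
        if len(inside) == len(kept):  # nothing excluded: the set is stable
            break
        kept = inside
    spread = stdev(kept) if len(kept) > 1 else 0.0
    return mean(kept), spread, kept


# Hypothetical lab averages (ppm Hg) for one sample; 0.90 is a planted outlier.
reported = [0.50, 0.52, 0.49, 0.51, 0.50, 0.53, 0.48, 0.51, 0.50, 0.90]
fwi_avg, fwi_sd, retained = average_excluding_outliers(reported)
```

With these made-up values the first pass removes 0.90 and the second pass removes nothing, so the procedure stops with nine retained results.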


This review summarizes the performance of only those laboratories which provided results for 
each of 32 samples, submitted in 8 sets of 4 samples each, between July 1987 and June 1991, 
in order to examine the possibility of setting performance criteria. The data from 
other laboratories which participated less frequently in these studies is not used. 


EVALUATION AND OBSERVATIONS 


Data was entered into a spreadsheet for each sample/participant based on the summary reports 
prepared by FWI staff. For most purposes the data was then evaluated relative to the average 
(excluding outliers) as reported by FWI in each study summary. 


Differences from FWI Avg: Distribution Plots (Table 1, 1a, Figures 1, 2, and Appendix) 


A table of residual differences (Table 1) was prepared as follows: the FWI interlaboratory 
sample average (previously calculated by FWI staff excluding outliers) was subtracted from each 
analyst’s average result. The average residual difference was determined for each laboratory 
(across samples), and corresponding standard deviations were calculated. Table 1 is sorted 
horizontally based on the average difference reported by each laboratory. 
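The construction of the residual table can be sketched as below. The data layout (nested dictionaries keyed by lab and sample), the function name, and all numbers are hypothetical; only the arithmetic (lab average minus FWI Avg, then a mean and standard deviation per lab, sorted by mean) follows the description above.

```python
from statistics import mean, stdev


def residual_summary(lab_results, fwi_avg):
    """lab_results: {lab: {sample: lab average}}; fwi_avg: {sample: FWI Avg}.
    Returns {lab: (average difference, std dev of differences)}, ordered from
    the most negative to the most positive average difference."""
    summary = {}
    for lab, results in lab_results.items():
        diffs = [results[s] - fwi_avg[s] for s in sorted(results)]
        summary[lab] = (mean(diffs), stdev(diffs))
    return dict(sorted(summary.items(), key=lambda item: item[1][0]))


# Hypothetical values in ppm Hg (three samples, two labs).
fwi = {"S1": 0.20, "S2": 0.50, "S3": 1.00}
labs = {
    "A": {"S1": 0.21, "S2": 0.51, "S3": 1.01},  # small constant positive offset
    "B": {"S1": 0.18, "S2": 0.47, "S3": 0.95},  # low, and worse at high levels
}
table = residual_summary(labs, fwi)
```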


Figure 1 indicates the level of bias associated with each laboratory, (sorted in order from negative 
to positive bias). It is clear that laboratories 01, 07, 16, 17, and 37 are least biased on average. 


Table 1a was prepared by sorting these residual differences according to magnitude. Figure 2 
shows the broad distribution of these differences. Although it appears to be more or less 
normally distributed, this distribution incorporates several distinct populations of laboratory 
performance. The Appendix includes diagrams showing the distribution for each laboratory. But 
it also includes line diagrams for each set of 4 samples to demonstrate that these residual 
differences frequently incorporate a significant level of either intercept or slope bias. (For further 
discussion refer to the Appendix.) 


Individual Long-term Laboratory Precision (Table 1, Figures 3, 4) 


Figure 3 shows for each laboratory the standard deviation of their residual differences. It is 
apparent that the six least biased laboratories demonstrate better long-term precision for these 8 
studies. Laboratory 3’s poorer precision prevented its inclusion among the best laboratories. 


In the following review, laboratories 01, 07, 16, 17, and 37 are discussed as the ‘best five’ labs. 
These facilities plus laboratories 03, 05, 31, 43, and 57 are discussed as the ‘better 10’ labs. 
The remaining facilities 02, 11, 19, 31, 46, and 47 have sporadic or chronic problems. The 
individual performance diagrams suggest that most of these facilities do produce acceptable data 
fairly regularly. 


Interlaboratory Sample Standard Deviations (Table 1, 2a, Figures 4, 5, 6) 


The residual differences in Table 1 were also averaged for each sample (across laboratories), and 
summarized in table 2a. Figures 4, 5, and 6, show the relationship of S vs FWI Avg for all 16, 
the ‘best 5’ and the ‘better 10’ facilities respectively. Because sample #191 (1.61 ppm) was 
somewhat isolated from the rest of the sample concentrations (next highest value was 1.15 ppm), 
it was excluded in determining the regression equation. 


Because all outliers were included in preparing figure 4, there is no clear relationship between 
S and FWI Avg. But based on the values tabulated by FWI for the S associated with their FWI 
Average (excluding outliers) sample concentration, the equation would be: 

    FWI estimate:        S = 0.017 + 5.6% FWI Avg ppm Hg   (r = 0.76) (Sy.x = 0.014) 

where: FWI Avg = interlab average concentration (excluding outliers) 


Based on the data shown in figures 5 and 6, the equations for interlaboratory reproducibility are: 

    the better 10 labs:  S = 0.011 + 5.9% FWI Avg ppm Hg   (r = 0.73) (Sy.x = 0.017) 
    the best 5 labs:     S = 0.004 + 4.1% FWI Avg ppm Hg   (r = 0.88) (Sy.x = 0.007) 
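Equations of this form can be obtained by ordinary least squares of S on FWI Avg. The sketch below assumes that Sy.x denotes the standard error of estimate (residual standard deviation with n - 2 degrees of freedom), which the report does not define explicitly. The function name and the data points are hypothetical; the points are placed exactly on the ‘best 5’ line purely to show the mechanics.

```python
from math import sqrt


def fit_line(x, y):
    """Least-squares fit y = intercept + slope * x.
    Returns (intercept, slope, r, sy_x), with sy_x the standard error of estimate."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    return intercept, slope, r, sqrt(sse / (n - 2))


# Hypothetical sample concentrations (ppm Hg); an isolated high sample such as
# #191 would be dropped from the lists before calling fit_line.
avg = [0.10, 0.30, 0.50, 0.90, 1.15]
s = [0.004 + 0.041 * a for a in avg]
intercept, slope, r, sy_x = fit_line(avg, s)
```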


For the ‘best five’ laboratories, the estimate includes labs (07, 17, 37) which showed somewhat 
less long-term precision than labs 01 and 16 (see appendix). Some of the data for the 
laboratories that are being reviewed in this report were obviously excluded from the FWI Avg 
as ‘outliers’. And the FWI Avg includes data from laboratories which reported measurements 
for some but not all of the samples in the 8 studies being evaluated here, and which are therefore 
not included in this evaluation. The comparability of these equations confirms the general 
reliability of these estimates of interlaboratory precision for the purpose of setting performance 
criteria (see below). 


Within-lab Standard Deviation (Sw) (Table 3, 2b, figures 7, 8, 9, 10) 


Table 3 records the individual estimates of Sw derived from the replicate measurements reported 
by each analyst for each sample as summarized by FWI staff. The average Sw per lab is 
summarized in Table 2b and figure 7. This information provides an estimate of the best precision 
possible. As expected, it is reasonably constant (at about 0.02 ppm) for most laboratories. Even 
the ‘less-controlled’ laboratories demonstrate acceptable repeatability on a per sample basis (labs 
55 and 05 show the poorest repeatability). 


The dependence of Sw on sample concentration is shown in figures 8, 9, and 10 for all 16 labs, 
the ‘best 5’, and the ‘better 10’ respectively, and is summarized below (based on the data in 
table 2b). For the ‘best 5’ laboratories, the average interlaboratory Sw is barely dependent 
on the sample concentration. Because sample #191 (1.61 ppm) was somewhat isolated from the 
rest of the sample concentrations (next highest value was 1.15 ppm), it was excluded in 
determining the regression equation. Based on linear regression, the equation for interlaboratory 
repeatability varies only slightly, if at all, depending upon the particular labs included. 


    for all 16 labs:     Sw = 0.008 + 3.6% FWI Avg ± 0.004 ppm Hg 
    the better 10 labs:  Sw = 0.011 + 2.7% FWI Avg ± 0.005 ppm Hg 
    the best 5 labs:     Sw = 0.010 + 2.0% FWI Avg ± 0.004 ppm Hg 

where: FWI Avg = interlab average concentration (excluding outliers) 


These equations apply in the range 0.09 to 1.2 ppm Hg. These estimates of Sw will be 
somewhat affected by the fact that the individual estimates of Sw per lab/sample were reported 
by FWI to only one significant figure. 


Performance Criteria (Table 4, figures 11, 12) 


Based on these observations, summary plots were prepared with the data from the ‘best 5’ 
facilities. Table 4 lists the median value reported per sample by the ‘best 5’ labs. It also 
tabulates the standard deviation of the data reported by these laboratories for each sample, and 
calculates the individual differences per sample for each laboratory relative to their median value. 


Figure 11 plots the FWI values versus the median for the ‘best 5’ laboratories. Figure 12 shows 
the dependence of S for the ‘best 5’ versus concentration, scaled so that it can be overlaid as the 
‘upper control limit’ on figure 13. The following observations are noteworthy: 


1) The medians of the best 5 laboratories are in close agreement with the FWI Avg. 
(FWI Avg - Median) = -0.001 + 2.5% Median 


The correlation coefficient R = 0.387 for this residuals equation indicates that the 
residuals slope of 2.5% is significantly different from zero. At an alpha of 0.05 the 
critical value for R is 0.35 for 30 degrees of freedom, (i.e., for this data set, over the 
long-term, the FWI Avg is slightly biased relative to the Median). 


2) The between-lab standard deviation for the ‘best 5’ labs versus the Median is: 

        S = 0.004 + 4.2% Median 

   The equation for these same labs versus the FWI Avg. was: 

        S = 0.004 + 4.1% FWI Avg. ± 0.007 


3) These findings suggest that a performance Warning Limit and Control Limit could be set 
   at: 
        WL = Median ± (0.008 + 0.08 FWI Avg). 
        CL = Median ± (0.012 + 0.13 FWI Avg). 
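The significance statement in observation 1 can be checked with the usual t-test for a correlation coefficient. The sketch below is one conventional way to do this and is not necessarily how the original calculation was carried out. The two-sided critical t of 2.042 for 30 degrees of freedom at alpha = 0.05 is taken from standard t-tables; the function names are hypothetical.

```python
from math import sqrt

T_CRIT_30DF = 2.042  # two-sided, alpha = 0.05, 30 degrees of freedom (t-table)


def t_from_r(r, df):
    """t statistic for testing whether a correlation coefficient differs from zero."""
    return r * sqrt(df / (1.0 - r * r))


def r_critical(t_crit, df):
    """Smallest |r| that is significant for the given critical t and df."""
    return t_crit / sqrt(t_crit ** 2 + df)


t_stat = t_from_r(0.387, 30)          # roughly 2.3
r_crit = r_critical(T_CRIT_30DF, 30)  # roughly 0.35
slope_is_significant = t_stat > T_CRIT_30DF
```

Both routes give the same answer as the text: R = 0.387 exceeds the critical value of about 0.35, so the 2.5% residuals slope is distinguishable from zero at the 5% level.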


Figure 13 shows the individual data points for the ‘best five’ labs plotted versus their median 
value, and Warning and Control Limit lines have been drawn approximating these equations. In 
estimating these performance criteria we include only the variability of the ‘best 5’ data about 
their own median. This reduces the impact of occasional ‘outliers’ in this selected data set, and 
ensures an internally consistent reference point for determining the dependence of ‘difference’ 
on concentration. 
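One way to read these limits, offered here as an interpretation and not as something the report states, is that the Warning and Control Limit half-widths sit close to 2 and 3 times the ‘best 5’ between-lab standard deviation (S = 0.004 + 4.2% Median). The short sketch below compares the ratios at a few hypothetical concentrations; the function names are illustrative only.

```python
def s_best5(conc):
    """Between-lab standard deviation for the 'best 5' labs versus the Median."""
    return 0.004 + 0.042 * conc


def warning_halfwidth(conc):
    """Half-width of the suggested Warning Limit."""
    return 0.008 + 0.08 * conc


def control_halfwidth(conc):
    """Half-width of the suggested Control Limit."""
    return 0.012 + 0.13 * conc


# Hypothetical concentrations (ppm Hg) within the studied range.
ratios = [
    (c, warning_halfwidth(c) / s_best5(c), control_halfwidth(c) / s_best5(c))
    for c in (0.1, 0.5, 1.0)
]
```

Across this range the Warning Limit ratio stays between 1.9 and 2.0 and the Control Limit ratio between 3.0 and 3.1.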


The individual performance plots in the appendix tend to substantiate some risk of bias in any 
given FWI Average. This occurs because ‘outliers’ are only detected if there is sufficient ‘good’ 
data. A bias can be induced by the inclusion of apparently acceptable data from laboratories with 
demonstrated poorer control. Since an average is easily affected by undetected outliers, the 
median is a preferred basis for determining a consensus value among laboratories, provided there 
is a reasonable degree of central tendency in the data. But FWI Avg could be substituted, with 
due precautions, in the above CL and WL equations without affecting the conclusions significantly. 


SUMMARY 


The Mercury in Fish Interlaboratory Study database, developed from the individual reports of 
FWI, has provided an excellent long-term overview of performance and control status. It is clear 
that most laboratories are capable of controlled performance. Laboratory performance criteria 
have been calculated for the analysis of Mercury in canned fish tissue based on an evaluation of 
within and between laboratory estimates of standard deviation. This evaluation is based on a 
limited set of five laboratories, selected from among 16, who displayed the least bias and best 
precision over a series of 8 sets of 4 canned fish tissue samples distributed between July 1987 
and June 1991. The individual diagrams in the appendix confirm the general achievability of the 
performance criteria based on the data from the ‘best 5’ laboratories. A separate paper will 
address the performance of all laboratories that have participated in approximately 50 such 
studies over the past 20 years. It is possible that the lack of specific performance criteria may 
have had an adverse effect on some environmental databases at some times in the past. 


Note that the FWI average corrected for outliers does not necessarily reflect accuracy. It is a 
consensus value which may be biased. But it is interesting that the distribution of differences 
is tightest for those who are close to this consensus value. These facilities are both precise and 
controlled, their individual working standards are in good agreement, and their sample preparation 
procedures apparently provide consistent recovery. 


In controlled laboratories long-term precision is not particularly dependent on concentration. For 
the best laboratories it may be almost independent of concentration. The variability induced by 
sample preparation (concentration-independent) tends to dominate in controlled laboratories, 
whereas the variation in re-setting slope factors, or in preparing and using standards, 
(concentration-dependent) tends to dominate in less-well controlled laboratories. Several of the 
facilities that have contributed data to these FWI round-robins could benefit from an internal 
review of the accuracy of their stock standards; many could benefit from a more stringent control 
program for their standard response factor (slope); and some seem to have occasional problems 
with baseline/blank corrections. But this review also suggests that these analysts (and by 
inference the others whose data was not used in this evaluation) have the capability to perform 
equally well. 


Based on the analysis of canned fish tissue as provided by FWI, these findings suggest that 
performance Warning Limits (WL) and Control Limits (CL) could be set at: 

WL = Median ± (0.008 + 0.08 FWI Avg). 

CL = Median ± (0.012 + 0.13 FWI Avg). 

Note that these criteria are based on the average of 6 replicate measurements (two separate sets 
of triplicates). They include allowance for interlaboratory and between-day variability. They 
should be considered for establishing in-laboratory control by those wishing to ensure that their 
data quality will meet a standard set by their peers. 


The consistent reliability of the fish samples provided by FWI ensures a reliable, and reasonable, 
basis for decisions about control status. It may be possible to review (decrease) the number of 
samples submitted per study and the number of replicates required per sample and yet retain the 
ability to flag potential analytical control problems. The average repeatability of the replicate 
measurements appears to be adequate compared to the variability induced by inadequate control. 


FWI Summary Reports round off the averages and standard deviations of the individual 
laboratory replicates to the nearest 0.01 ppm Hg. The presence of an additional significant figure 
would have improved the quality of the estimates of precision and bias attempted in this review. 
The effect of round-off is seen in the distribution of differences plots as a deficiency of odd- 
valued differences or gaps. The presence of an additional significant digit would have provided 
smoother distributions and better estimates of bias and variability. When applying the Warning 
and Control Limits suggested from this evaluation it would be advisable to record the average 
per sample to the nearest 0.001 ppm Hg. 


If one wishes to monitor small trends in the mercury content of fish over extended periods of 
time it may be advisable to require replicate analysis of at least some fish samples, regular 
verification of working mercury standards and calibration factors, and careful evaluation of 
sources of baseline or blank related determinate error. 
FWI Study Reports used in this Evaluation 


SAMPLE SET    DATE SAMPLES SENT    DATE OF REPORT 
180-183       17 Jul., 1987        29 Oct., 1987 
184-187        8 Jan., 1988        16 May, 1988 
188-191       27 Jun., 1988        14 Oct., 1988 
192-195       23 Dec., 1988        20 Apr., 1989 
200-203       13 Oct., 1989        30 Jan., 1990 
204-207       16 Apr., 1990        27 Jun., 1990 
208-211        8 Aug., 1990        11 Dec., 1990 
216-219          Jun., 1991        28 Aug., 1991 


Note: sample analyses were to be performed on a specified date about 1 month after date 
submitted. 


TABLES AND DIAGRAMS 


TABLE 1:    DIFFERENCES BETWEEN LAB RESULTS AND THE FWI AVERAGE 

TABLE 1a:   DISTRIBUTION OF DIFFERENCES FROM TABLE 1 PER LAB 
            Plus Summary of Distribution: Best 5, Better 10, Overall 

TABLE 2a:   SUMMARY PER SAMPLE FROM TABLE 1 (Avg.Diff & Std.Dev of Diff) 
            Plus Summary of Interlab Std.Dev: Best 5, Better 10, All 16 labs 

TABLE 2b:   SUMMARY PER SAMPLE FROM TABLE 2 (Avg. Sw & Std.Dev of Sw) 
            Plus Summary of In-Lab Std.Dev: Best 5, Better 10, Overall 

TABLE 3:    IN-LAB Sw ESTIMATES (AS REPORTED IN FWI STUDY SUMMARIES) 

TABLE 4:    PERFORMANCE CHARACTERISTICS (DATA FROM BEST 5 LABS) 

TABLE 5:    INDIVIDUAL PERFORMANCE VS CONTROL AND WARNING LIMITS 


FIGURE 1:   AVERAGE DIFFERENCE FROM FWI Avg (32 samples per lab) 

FIGURE 2:   DISTRIBUTION OF DIFFERENCES ABOUT FWI Avg (all 16 labs, 32 samples) 

FIGURE 3:   STANDARD DEVIATION OF DIFFERENCES (32 samples per lab) 

FIGURE 4:   INTERLAB SD vs FWI Avg (All 16 labs, per sample) 

FIGURE 5:   INTERLAB SD vs FWI Avg (Best 5 labs, per sample) 

FIGURE 6:   INTERLAB SD vs FWI Avg (Better 10 labs, per sample) 

FIGURE 7:   AVERAGE IN-LAB STD DEV (Sw) (Summarized by FWI for each sample/lab) 

FIGURE 8:   WITHIN-LAB Sw vs FWI Avg (All 16 labs, per sample) 

FIGURE 9:   WITHIN-LAB Sw vs FWI Avg (Best 5 labs, per sample) 

FIGURE 10:  WITHIN-LAB Sw vs FWI Avg (Better 10 labs, per sample) 

FIGURE 11:  COMPARISON OF FWI AVERAGE vs MEDIAN (Best 5 labs) 

FIGURE 12:  INTERLAB STD DEV vs MEDIAN (Best 5 labs) 

FIGURE 13:  WARNING AND CONTROL LIMITS FOR DIFFERENCE FROM MEDIAN 
            Plotted values are the Best 5 labs’ differences from medians 
APPENDIX: 
INDIVIDUAL LABORATORY PERFORMANCE DIAGRAMS 


Individual Laboratory Performance (Table 1a) 


Table 1a was prepared to examine the distribution of differences for various laboratories and combinations 
of labs. Distribution plots were prepared for each laboratory. For some laboratories the differences from 
the sample average (excluding outliers as calculated by FWI staff) are quite tightly clustered, and some 
are more or less centred about a zero difference from this FWI Average value. In order to achieve this 
pattern, (for example lab #16) the analyst must have a good within-lab single-analyst repeatability, and 
must have maintained good control over sources of between-run biases including blank/baseline and 
calibration/slope corrections over the time frame of these eight studies (a period of four years). 


Because the sample concentrations range from 0.1 to 1.6 ppm Hg, the data in table 1 was also used to 
examine any dependence of difference on sample concentration. These appear as the bottom diagram in 
each of the individual laboratory performance figures. Because each study involves four samples, the 
respective points were joined in order of increasing concentration for each study. In this type of residuals 
diagram, each set of four points should fit a reasonably straight line: fluctuation reflects repeatability or 
individual sample measurement problems. The intercept should be zero difference, and the slope should 
be zero: otherwise one suspects determinate error in calibration. 
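The straight-line check described above can be sketched by fitting a line through the four differences of one study and inspecting its intercept and slope. The function name and both example studies are hypothetical; they are constructed to show what a pure slope bias and a pure blank/baseline bias would look like.

```python
def residual_line(concentrations, differences):
    """Least-squares line through (concentration, difference) points of one study.
    Returns (intercept, slope); both should be near zero for a controlled lab."""
    n = len(concentrations)
    mean_c = sum(concentrations) / n
    mean_d = sum(differences) / n
    sxx = sum((c - mean_c) ** 2 for c in concentrations)
    sxy = sum((c - mean_c) * (d - mean_d)
              for c, d in zip(concentrations, differences))
    slope = sxy / sxx
    return mean_d - slope * mean_c, slope


# Hypothetical study of four samples (ppm Hg), in order of increasing concentration.
conc = [0.2, 0.4, 0.8, 1.2]
slope_biased = [0.05 * c for c in conc]   # results read 5% high: calibration/slope error
blank_biased = [0.03, 0.03, 0.03, 0.03]   # constant offset: blank/baseline error

slope_case = residual_line(conc, slope_biased)   # intercept near 0, slope near 0.05
blank_case = residual_line(conc, blank_biased)   # intercept near 0.03, slope near 0
```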


Note that the distribution diagram often presents an impression of normality. Therefore data points which 
differ because of a blank or slope related bias (as shown in the lower diagram of difference versus 
concentration) would generally not be detected as an outlier based on the usual tests for outliers. Data 
that is out-of-control cannot be used to determine effective control limits. 


Individual Performance Evaluation 


Table 5 shows the outcome of testing each laboratory result for difference from the FWI Avg. relative to 
the Warning or Control Limits specified above. Results beyond the control limits are identified as ‘HIGH’ 
or ‘LOW’. Values beyond the Warning Limits are indicated ‘high’ or ‘low’. The individual performance 
diagrams in the appendix also help to indicate the nature of any control problems. The following 
examples will assist the reader in investigating causes for the observed performance of individual 
laboratories. 
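The flagging convention used in Table 5 can be sketched as below, using the Warning and Control Limit half-widths suggested in this report (0.008 + 0.08 FWI Avg and 0.012 + 0.13 FWI Avg). The function name, the empty string for an in-control result, and the example values are assumptions made for illustration.

```python
def flag_result(difference, fwi_avg):
    """Classify a lab's difference from the FWI Avg (ppm Hg).
    'HIGH'/'LOW' = beyond the Control Limit, 'high'/'low' = beyond the
    Warning Limit only, '' = within the Warning Limit."""
    warning = 0.008 + 0.08 * fwi_avg
    control = 0.012 + 0.13 * fwi_avg
    if difference > control:
        return "HIGH"
    if difference < -control:
        return "LOW"
    if difference > warning:
        return "high"
    if difference < -warning:
        return "low"
    return ""


# Hypothetical differences for a sample with FWI Avg = 0.50 ppm Hg
# (Warning Limit = 0.048, Control Limit = 0.077).
flags = [flag_result(d, 0.50) for d in (0.09, 0.06, 0.02, -0.05, -0.10)]
```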


Labs 01, 07, 16, 17, and 37 set the standard for long-term performance. Lab 01 has two atypical results. 
Lab 07 has 3 atypical results. Lab 17 is biased low in sets #180-187. 


Lab 02 has had a continuing slope bias. The stock standard may be too strong. 
Lab 03 had severe positive slope bias in set #180-184 and set #204-207. 


Lab 05 had one severe blank/baseline problem for samples #192-195. The remaining data is reasonable. 
There is room for improved slope control. 


Lab 11 has sporadic control problems including both intercept and slope biases. Sample sets #188-199 
and #204-209 are particularly notable. 


Lab 19 was biased low in sets #204-219. Slope control could be improved. 

Lab 31 slope control could be improved. 

Lab 43 data is well controlled but biased high on average; stock standard may be too weak? 

Lab 46 has a severe slope and intercept control problem. 

Lab 47 had a slope problem for set #188-191, and some erratic values. Rest of data is well controlled. 

Lab 55 stock standard may be too strong. Bias obscured by significant variability in slope control. There 
may be a variable but generally low intercept problem (over-correction or inadequate control of 
baseline/blank?). 


Lab 57 had (blank?) problems in samples #180-188. Otherwise control is good. 